Exploring automatic word sense disambiguation with decision lists and the Web

Authors

  • Eneko Agirre
  • David Martínez
Abstract

The most effective paradigm for word sense disambiguation, supervised learning, seems to be stuck because of the knowledge acquisition bottleneck. In this paper we present an in-depth study of the performance of decision lists on two publicly available corpora and an additional corpus automatically acquired from the Web, using the fine-grained, highly polysemous senses in WordNet. Decision lists are shown to be a versatile state-of-the-art technique. The experiments reveal, among other facts, that SemCor can be an acceptable starting point for an all-words system (0.7 precision for polysemous words). The results on the DSO corpus show that for some highly polysemous words 0.7 precision seems to be the current state-of-the-art limit. On the other hand, independently constructed hand-tagged corpora are not mutually useful, and a corpus automatically acquired from the Web is shown to fail.

Related resources

Hierarchical Decision Lists for Word Sense Disambiguation

This paper describes a supervised algorithm for word sense disambiguation based on hierarchies of decision lists. This algorithm supports a useful degree of conditional branching while minimizing the training data fragmentation typical of decision trees. Classifications are based on a rich set of collocational, morphological and syntactic contextual features, extracted automatically from traini...

Emergent Linguistic Rules from inducing Decision Trees: Disambiguating Discourse Clue Words

We apply decision tree induction to the problem of discourse clue word sense disambiguation. The automatic partitioning of the training set which is intrinsic to decision tree induction gives rise to linguistically viable rules.

Unsupervised Word Sense Disambiguation Using The WWW

This paper presents a novel unsupervised methodology for automatic disambiguation of nouns found in unrestricted corpora. The proposed method is based on extending the context of a target word by querying the web, and then measuring the overlap of the extended context with the topic signatures of the different senses by using Bayes rule. The algorithm is evaluated on Semcor 2.0. The evaluation ...

SWAT-MP: Supervised WSD and Affective Text Tagging

In this paper, we describe our Word Sense Disambiguation system for SEMEVAL-1 task 5: Multilingual Chinese-English Lexical Sample Task. We implement methods based on Bayesian calculations, cosine comparison of word-frequency vectors, decision lists, and Latent Semantic Analysis. We also implement a simple classifier combination system that combines these classifiers into one WSD module. The res...

Word Sense Disambiguation of Ambiguous Persian Words with the LDA Topic Model

Word sense disambiguation is the task of identifying the correct sense of a word in a given context from a finite set of possible senses. In this paper a model for Farsi word sense disambiguation is presented. The model uses two groups of features: first, all words and stop words around the target word; second, topic models. We extract topics from a Farsi corpus with Latent Dirichlet ...

Journal:
  • CoRR

Volume cs.CL/0010024  Issue 

Pages  -

Publication date 2000